Learning Syntactic Categories Using Paradigmatic Representations of Word Context
Authors
Abstract
We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on the potential substitutes of a word, in contrast to syntagmatic representations, which are based on the properties of neighboring words. We compare a bigram-based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model, based on Euclidean co-occurrence embedding, combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag, 1M-word corpus.
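To make the contrast concrete, the following minimal sketch (our illustration, not the paper's released code) derives a paradigmatic context representation from plain bigram counts: each candidate substitute w for a position is scored by P(w | left) * P(right | w). The toy corpus, the lack of smoothing, and all names are assumptions for illustration only.

    from collections import Counter

    def bigram_counts(corpus):
        # corpus: list of token lists; returns unigram and bigram counts
        uni, bi = Counter(), Counter()
        for sent in corpus:
            toks = ["<s>"] + sent + ["</s>"]
            uni.update(toks)
            bi.update(zip(toks, toks[1:]))
        return uni, bi

    def substitute_distribution(left, right, uni, bi):
        # Paradigmatic context of a position: a distribution over substitutes,
        # scored as P(w | left) * P(right | w) under an unsmoothed bigram model.
        scores = {}
        for w in uni:
            if uni[left] and uni[w]:
                s = (bi[(left, w)] / uni[left]) * (bi[(w, right)] / uni[w])
                if s > 0:
                    scores[w] = s
        z = sum(scores.values())
        return {w: s / z for w, s in scores.items()}

    corpus = [["the", "dog", "barks"], ["the", "cat", "barks"]]
    uni, bi = bigram_counts(corpus)
    # Substitutes for the slot in "the ___ barks": dog and cat share the slot,
    # which is exactly the signal used to group them into one category.
    print(substitute_distribution("the", "barks", uni, bi))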
Similar resources
Word Context and Token Representations from Paradigmatic Relations and Their Application to Part-of-Speech Induction
Representation of words as dense real vectors in Euclidean space provides an intuitive definition of relatedness in terms of the distance or angle between them. The regions occupied by these word representations reveal syntactic and semantic traits of the words. In addition, word representations can be incorporated into other natural language processing algorithms as features. In th...
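As a concrete reading of "distance or angle", here is a minimal NumPy sketch; the three-dimensional vectors and their values are hypothetical toys, not embeddings from the paper.

    import numpy as np

    def cosine(u, v):
        # Angle-based relatedness: 1.0 means identical direction
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def euclidean(u, v):
        # Distance-based relatedness: smaller means more related
        return float(np.linalg.norm(u - v))

    cat = np.array([0.8, 0.1, 0.3])   # hypothetical embedding
    dog = np.array([0.7, 0.2, 0.3])   # hypothetical embedding
    car = np.array([0.1, 0.9, 0.5])   # hypothetical embedding
    print(cosine(cat, dog), euclidean(cat, dog))  # close pair
    print(cosine(cat, car), euclidean(cat, car))  # distant pair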
Learning grammatical categories using paradigmatic representations: Substitute words for language acquisition
Learning word categories is a fundamental task in language acquisition. Previous studies show that the co-occurrence patterns of preceding and following words are essential for grouping words into categories. However, the neighboring words, or frames, are rarely repeated exactly in the data. This creates data sparsity and hampers learning for frame-based models. In this work, we propose a paradigmatic ...
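The sparsity this abstract points to is easy to demonstrate: treating a frame as the exact (left, right) neighbor pair around a token, most frames in a small corpus occur exactly once. A minimal counting sketch, under our own tokenization assumptions:

    from collections import Counter

    def frame_counts(corpus):
        # A frame is the exact (left, right) neighbor pair around each token.
        frames = Counter()
        for sent in corpus:
            toks = ["<s>"] + sent + ["</s>"]
            for i in range(1, len(toks) - 1):
                frames[(toks[i - 1], toks[i + 1])] += 1
        return frames

    corpus = [["the", "dog", "barks"], ["a", "cat", "sleeps"]]
    frames = frame_counts(corpus)
    singletons = sum(1 for c in frames.values() if c == 1)
    # In realistic corpora most frames are singletons, which is the data
    # sparsity that substitute-based (paradigmatic) models sidestep.
    print(f"{singletons}/{len(frames)} frames occur only once")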
Research Interests João Sedoc Description of Work
Presently my main research interest is the development and application of machine learning and statistical techniques to natural language processing. The representation of words using vector space models is widely used for a variety of natural language processing (NLP) tasks. The two main word embedding categories are cluster-based and dense representations. Brown Clustering and other hiera...
Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings
The number of word embedding models is growing every year. Most of them are based on the co-occurrence information of words and their contexts. However, it is still an open question what the best definition of context is. We provide a systematic investigation of 4 different syntactic context types and context representations for learning word embeddings. Comprehensive experiments are conducte...
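As one hedged illustration of how context representations can differ (our example, not necessarily among the paper's four syntactic context types), the sketch below extracts linear-window contexts either as bare words or annotated with their relative position:

    def linear_contexts(sent, window=2, positional=False):
        # Yields (word, context) pairs from a symmetric linear window;
        # positional=True marks each context word with its relative offset.
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j == i:
                    continue
                c = sent[j] + (f"_{j - i:+d}" if positional else "")
                yield (w, c)

    sent = ["the", "dog", "barks"]
    print(list(linear_contexts(sent, window=1)))
    print(list(linear_contexts(sent, window=1, positional=True)))
    # Positional contexts like "dog_+1" vs "dog_-1" keep word-order
    # information that plain bag-of-words contexts discard.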
Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations
Vector space representations of words have been widely used to capture fine-grained linguistic regularities and have proven successful in various natural language processing tasks in recent years. However, existing models for learning word representations focus on either syntagmatic or paradigmatic relations alone. In this paper, we argue that it is beneficial to jointly model both relations...